The performance of LSTM models from basin to continental scales

Virtual presentation at the EGU General Assembly 2020 comparing LSTMs trained for each basin individually with a single LSTM trained for all basins together.

Abstract

Simulation accuracy among traditional hydrological models usually degrades significantly when going from single basin to regional scale. Hydrological models perform best when calibrated for specific basins, and do worse when a regional calibration scheme is used.

One reason for this is that these models do not (have to) learn hydrological processes from data. Rather, they have a predefined model structure and only a handful of parameters adapt to specific basins. This often yields less-than-optimal parameter values when the loss is not determined by a single basin, but by many through regional calibration.

The opposite is true for data driven approaches where models tend to get better with more and diverse training data. We examine whether this holds true when modeling rainfall-runoff processes with deep learning, or if, like their process-based counterparts, data-driven hydrological models degrade when going from basin to regional scale.

Recently, Kratzert et al. (2018) showed that the Long Short-Term Memory network (LSTM), a special type of recurrent neural network, achieves comparable performance to the SAC-SMA at basin scale. In follow up work Kratzert et al. (2019a) trained a single LSTM for hundreds of basins in the continental US, which outperformed a set of hydrological models significantly, even compared to basin-calibrated hydrological models. On average, a single LSTM is even better in out-of-sample predictions (ungauged) compared to the SAC-SMA in-sample (gauged) or US National Water Model (Kratzert et al. 2019b).

LSTM-based approaches usually involve tuning a large number of hyperparameters, such as the number of neurons, number of layers, and learning rate, that are critical for the predictive performance. Therefore, large-scale hyperparameter search has to be performed to obtain a proficient LSTM network.

However, in the abovementioned studies, hyperparameter optimization was not conducted at large scale and e.g. in Kratzert et al. (2018) the same network hyperparameters were used in all basins, instead of tuning hyperparameters for each basin separately. It is yet unclear whether LSTMs follow the same trend of traditional hydrological models to degrade performance from basin to regional scale.

In the current study, we performed a computational expensive, basin-specific hyperparameter search to explore how site-specific LSTMs differ in performance compared to regionally calibrated LSTMs. We compared our results to the mHM and VIC models, once calibrated per-basin and once using an MPR regionalization scheme. These benchmark models were calibrated individual research groups, to eliminate bias in our study. We analyse whether differences in basin-specific vs regional model performance can be linked to basin attributes or data set characteristics.

Links

Abstract

Slides

Citation

@inproceedings{kratzert2020basintocontinent,
  title={The performance of LSTM models from basin to continental scales},
  author={Kratzert, F. and Klotz, D. and Shalev, G. and Nevo, S. and Klambauer, G. and Nearing, G. and Hochreiter, S. },
  booktitle={EGU General Assembly 2020},
  venue={Vienna, Austria},
  date={4-8 May},
  year={2020}
}